Arabic Morphology Using Only Finite-State Operations
Abstract
Finite-state morphology has been successful in the description and computational implementation of a wide variety of natural languages. However, the particular challenges of Arabic, and the limitations of some implementations of finite-state morphology, have led many researchers to believe that finite-state power was not sufficient to handle Arabic and other Semitic morphology. This paper illustrates how the morphotactics and the variation rules of Arabic have been described using only finite-state operations and how this approach has been implemented in a significant morphological analyzer/generator.
1 Introduction
In Arabic, as in other natural languages, the two challenges of morphological analysis are the description of 1) the morphotactics and 2) the variation rules. Morphotactics is the study of how morphemes combine to make well-formed words. Variations are the discrepancies between the underlying or morphophonemic strings and their surface realizations, which are either phonological or orthographical strings depending on the purpose of the grammar. The key insight and claim of the finite-state approach to morphology (Karttunen, 1991; Karttunen et al., 1992; Karttunen, 1994) is that both morphotactics and variation grammars can be written as regular expressions, which are compiled and implemented on computers as finite-state automata. Such grammars are interesting theoretically because they are highly constrained; and in practical computational linguistics for natural languages, finite-state automata are fast, usually compact in size, bidirectional, combinable using all valid finite-state operations, and consultable using language-independent lookup code.
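The bidirectionality claimed above can be sketched with a toy transducer. This is only a minimal illustration, not the analyzer described in this paper: the `FST` class, the `apply` method, the tags `+Sg` and `+Pl`, and the English word "fly" are all assumptions chosen for brevity. One network encodes both a morphotactic fact (stem followed by a number tag) and a variation (lexical y realized as surface i before the plural ending), and the same arcs are consulted for generation and for analysis.

```python
from collections import defaultdict

EPS = ""  # epsilon, the empty symbol (assumption: no epsilon cycles in the network)


class FST:
    """Toy finite-state transducer; hypothetical helper, not a real library API."""

    def __init__(self, start, finals, arcs):
        # arcs: iterable of (source, upper_symbol, lower_symbol, target)
        self.start = start
        self.finals = set(finals)
        self.arcs = defaultdict(list)
        for src, upper, lower, dst in arcs:
            self.arcs[src].append((upper, lower, dst))

    def apply(self, text, side):
        """Map `text`, read on `side` ("upper" or "lower"), to the other side."""
        results = set()
        stack = [(self.start, 0, "")]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(text) and state in self.finals:
                results.add(out)
            for upper, lower, dst in self.arcs.get(state, []):
                inp, outp = (upper, lower) if side == "upper" else (lower, upper)
                if inp == EPS:
                    stack.append((dst, pos, out + outp))
                elif text.startswith(inp, pos):
                    stack.append((dst, pos + len(inp), out + outp))
        return results


# Upper side: lexical strings (stem plus tag). Lower side: surface strings.
toy = FST(
    start=0,
    finals={4},
    arcs=[
        (0, "f", "f", 1),
        (1, "l", "l", 2),
        (2, "y", "y", 3), (3, "+Sg", EPS, 4),   # fly+Sg <-> fly
        (2, "y", "i", 5), (5, "+Pl", "es", 4),  # fly+Pl <-> flies
    ],
)

print(toy.apply("fly+Pl", side="upper"))  # generation
print(toy.apply("flies", side="lower"))   # analysis
```

The lookup code in `apply` knows nothing about English or Arabic; only the arc list is language-specific, which is the sense in which lookup is language-independent.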
Finite-state approaches to morphology, including the readily available implementation known as Two-Level Morphology (Koskenniemi, 1983; Antworth, 1990), have been shown to work in significant projects for French, English, Spanish, Portuguese, Italian, Finnish, Turkish and a wide variety of other natural languages. But despite the high attractiveness of finite-state computing, many investigators have concluded that finite-state techniques are not adequate for describing Semitic root-and-pattern morphology. This paper will present the case that fully implemented finite-state morphology can be and has been used successfully for Arabic.
2 Regular Expressions
When writing a finite-state morphological grammar, linguists state morphotactic and variation rules in the metalanguage of regular expressions or in higher-level languages that are convenient shorthand notations for complex regular expressions.
2.1 Regular Expressions, Regular Relations, and Finite-State Transducers
A regular expression that contains an alphabet of one-level symbols defines a regular language and compiles into a finite-state machine (FSM) that accepts this regular language. A regular expression that contains an alphabet of paired symbols defines a regular relation (a relation between two regular languages) and compiles into a finite-state transducer (FST) that maps from every string of one language into strings of the other. If the necessary finite-state algorithms and compilers are available, components of the grammar, including various sublexicons
Similar resources
Arabic Morphology Parsing Revisited
In this paper we propose a new approach to the description of Arabic morphology using 2-tape finite state transducers, based on a particular and systematic use of the operation of composition in a way that allows for incremental substitutions of concatenated lexical morpheme specifications with their surface realization for non-concatenative processes (the case of Arabic templatic interdigitati...
On Abstract Finite-State Morphology
Aspects of abstract finite-state morphology are introduced and demonstrated. The use of two-way finite automata for Arabic noun stem and verb root inflection leads to abstractions based on finite-state transition network topology as well as the form and content of network arcs. Nonconcatenative morphology is distinguished from concatenative morphology by its use of movement on the output tape r...
Implementing Urdu Grammar as Open Source Software
Urdu is a challenging language because of, first, its Perso-Arabic script, second, its morphological system having inherent grammatical forms and vocabulary of Arabic, Persian and the native languages of South Asia and third, its pragmatically neutral constituent order (SOV Subject Object Verb). Today, the state of art technology to write grammars (morphology + syntax) is to use special-purpose ...
Application of Skipsm to Binary Morphology
This paper describes the application of SKIPSM (Separated-Kernel Image Processing using Finite State Machines) to binary morphology. In comparison with conventional hardware-based and software-based approaches, SKIPSM allows implementation at higher speeds and/or lower hardware cost. The key theoretical developments upon which this improved performance is based are the separation of 2-D binary ...
An Ambiguity-Controlled Morphological Analyzer for Modern Standard Arabic Modelling Finite State Networks
Morphological ambiguity is a major concern for syntactic parsers, POS taggers and other NLP tools. For example, the greater the number of morphological analyses given for a lexical entry, the longer a parser takes in analyzing a sentence, and the greater the number of parses it produces. Xerox Arabic Finite State Morphology and Buckwalter Arabic Morphological Analyzer are two of the best known,...
Publication date: 1998